Conversation

@philip-paul-mueller
Contributor

@philip-paul-mueller philip-paul-mueller commented Jan 28, 2026

Updates how the set of data that can be moved into an `if` block is computed. While not strictly adding new capabilities, this PR refines the condition to properly handle some conditions found in MuPhys.

for oedge in state.out_edges(reloc_node)
if oedge.dst is not if_block
):
nodes_proposed_for_reloc.discard(reloc_node)
Contributor

Why discard() and not remove()? remove() is more strict because it also checks that the element exists, right? And the node has to exist in nodes_proposed_for_reloc.

Contributor Author

This is an interesting point.
If you look at line 680, you see that sometimes multiple nodes are removed in one go. Because the loop iterates over a copy, a node might already have been removed in a previous iteration, so it might no longer be inside the set.
However, your comment made me think, and I have now changed it so that at the beginning it is checked whether the node is still there.
That is probably nicer and a bit faster, because you perform an early exit.
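For readers following along, a minimal self-contained sketch of the point being discussed; the names and the dependency map are illustrative, not the PR's actual code. It shows why `discard()` (a no-op for absent elements) is the safe call when pruning a set while iterating over a copy of it, and the early-exit membership check mentioned above:

```python
# Illustrative sketch (hypothetical names, not the PR's actual code):
# pruning a set of relocation candidates while iterating over a copy.
nodes_proposed_for_reloc = {"a", "b", "c"}

# Hypothetical dependency map: pruning one candidate also invalidates others.
invalidates = {"a": {"b"}, "b": set(), "c": set()}

for node in list(nodes_proposed_for_reloc):  # iterate over a copy
    # Early exit: the node may already have been pruned in a previous iteration.
    if node not in nodes_proposed_for_reloc:
        continue
    if node == "a":  # pretend "a" fails the relocation check
        nodes_proposed_for_reloc.discard(node)
        for other in invalidates[node]:
            # remove() would raise KeyError here if `other` were already gone;
            # discard() is a harmless no-op in that case.
            nodes_proposed_for_reloc.discard(other)

print(sorted(nodes_proposed_for_reloc))  # ['c']
```

Regardless of the (arbitrary) iteration order over the copied set, the result is the same: once "a" is visited, both "a" and "b" are gone, and the early-exit check skips any candidate that was removed in a previous iteration.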

return branch_nodes

else:
# Only some of the incoming nodes will be moved into the `if`
Contributor

The first part of the comment is misleading, in my opinion. I would write:

Suggested change
# Only some of the incoming nodes will be moved into the `if`
# Only some of the incoming nodes could be moved into the `if`

Contributor Author

You are right, your suggestion is better.


else:
# Only some of the incoming nodes will be moved into the `if`
# body. This is legal only if the not moved nodes are
Contributor

and below:

Suggested change
# body. This is legal only if the not moved nodes are
# body. It would be legal to relocate the node if all incoming nodes were access nodes; this is why access nodes are not included in `incoming_nodes`.
# In this case, we cannot relocate the node, so we also discard those `incoming_nodes` that were candidates for relocation.

Contributor Author

I kind of like this suggestion, but it partially repeats the comments about input_nodes.
However, I have renamed input_nodes (which is a bad name) to non_mappable_input_nodes, which is a much more accurate name than the old one, and have updated the description by incorporating your suggestions.
I have also refactored it a bit; it should now be simpler.
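The rule being documented in this thread can be sketched as follows. All names here (`Node`, `prune_candidates`, `non_mappable`) are hypothetical illustrations of the idea, not GT4Py's or DaCe's actual API: a candidate may only be relocated into the `if` body if every input that is not itself relocated is an access node, and dropping one candidate can in turn invalidate another, so the pruning runs to a fixed point.

```python
# Hypothetical sketch of the relocation-legality rule (illustrative names,
# not GT4Py's actual API).
from dataclasses import dataclass

@dataclass(frozen=True)
class Node:
    name: str
    is_access_node: bool = False

def prune_candidates(candidates: set, inputs: dict) -> set:
    """Drop every candidate that has a non-mappable input.

    `inputs` maps a node to the set of nodes feeding into it.
    Inputs that are access nodes are always fine; any other input must
    itself be a relocation candidate, otherwise the candidate is dropped.
    """
    result = set(candidates)
    changed = True
    while changed:  # dropping one candidate can invalidate another
        changed = False
        for node in list(result):  # iterate over a copy while pruning
            non_mappable = {
                src for src in inputs.get(node, set())
                if not src.is_access_node and src not in result
            }
            if non_mappable:
                result.discard(node)
                changed = True
    return result
```

Usage under the same assumptions: if a tasklet's only input is an access node, it stays a candidate, and so does a second tasklet fed by the first; but if the feeding tasklet is not itself a candidate, the second one must be dropped.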

@edopao edopao changed the title feat[dace-next]: Updated MoveDataflowIntoIfBody feat[next-dace]: Updated MoveDataflowIntoIfBody Jan 29, 2026
@philip-paul-mueller
Contributor Author

No degradation is observed.
[benchmark plot: bench_blueline_stencil_compute]

@philip-paul-mueller
Contributor Author

I did an experiment with MuPhys' Graupel program, with the following results:
The setup performed 100 iterations using the R2B07_maxFrac grid.
ICON4Py was on muphys_bug_fix_staging (90d9df9c342fa090de0483d00451b17dd8be95df).

The results:

  • Current GT4Py main (27a6a08065): 5.930838584899902
  • TaskletFusion (d7077175397): 4.147534608840942
  • New MoveDataflowIntoIfBody (83d99f3df453e): 5.8804097175598145
  • TaskletFusion + MoveDataflowIntoIfBody: 4.2691802978515625

It seems that the two features combined lead to slightly worse performance, but this could also be an artifact.

@iomaganaris
Contributor

It seems that the two features combined lead to slightly worse performance, but this could also be an artifact.

This is a 3% difference, which might just be due to the variability of the runtime. We could check, though, whether there is a difference in the total number of nodes in the SDFG, or inspect a ConditionalBlock to see if we get the expected behavior.
Let me know if you want me to have a look at that.

@philip-paul-mueller
Contributor Author

@iomaganaris It is probably nothing, but if you have the setup to do a quick check, it would be very appreciated.

@philip-paul-mueller philip-paul-mueller marked this pull request as ready for review January 29, 2026 13:15
Contributor

@edopao edopao left a comment

LGTM

Contributor

@iomaganaris iomaganaris left a comment

LGTM 👍

@philip-paul-mueller philip-paul-mueller merged commit 224cd9a into GridTools:main Jan 30, 2026
22 of 23 checks passed